The stimulated cells part of the Cano-Gamez et al. data contains T cells that were activated by anti-Cd3/CD28 beads. Cytokines were added to force the development of subtypes.
| Annotation ‘cytokine.condition’ | TCR activation / anti-CD3/anti-CD28 beads | Added Cytokines |
|---|---|---|
| UNS | No | None |
| Th0 | Yes | None |
| Th2 | Yes | IL4, anti-IFN-\gamma |
| Th17 | Yes | IL-6, IL-23, IL-1\beta, TGF-\beta, anti-IL-4, anti-IFN- \gamma |
| iTreg | Yes | TGF-\beta, IL-2 |
Loading the necessary libraries
import scanpy as sc
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# settings can be adapted individually
sc.settings.verbosity = 3
sc.logging.print_header()
sc.settings.set_figure_params(dpi = 100, format = 'png')
scanpy==1.7.2 anndata==0.7.6 umap==0.4.6 numpy==1.20.1 scipy==1.6.2 pandas==1.2.4 scikit-learn==0.24.2 statsmodels==0.12.2 python-igraph==0.9.1 louvain==0.7.0
Load preprocessed scRNA-seq data
See notebook "Data preprocessing" for this analysis part"
canogamez = sc.read_h5ad("result_files/canogamez_preprocessing.h5ad") # change to your data path
# create a path to store the preprocessed file
results_file = '/canogamez_stim.h5ad' # change to your data path
Separate the data in stimulated and UNS cells
# select all UNS cells
resting = canogamez.obs['cytokine.condition'] == 'UNS'
# invert to get stimulated cells
stimulated = np.invert(resting)
# create AnnData consisting only of stimulated cells
canogamez_act = canogamez[stimulated,:]
sc.tl.pca(canogamez_act, svd_solver = 'arpack')
computing PCA
on highly variable genes
with n_comps=50
finished (0:00:02)
fig, (ax1, ax2, ax3) = plt.subplots(1,3, figsize = (20,4), gridspec_kw = {'wspace':1})
ax1_dict = sc.pl.pca(canogamez_act, color = 'cell.type', ax = ax1, show = False)
ax2_dict = sc.pl.pca(canogamez_act, color = 'cytokine.condition', ax = ax2, show = False)
ax3_dict = sc.pl.pca(canogamez_act, color = 'donor.id', ax = ax3, show = False)
sc.pl.pca(canogamez_act, color = 'cytokine.condition', projection = '3d',annotate_var_explained = True)
sc.pl.pca(canogamez_act, color = 'cytokine.condition', components = ['1,2','3,4','5,6','7,8'], ncols = 2)
sc.pl.pca_variance_ratio(canogamez_act, log = True)
No clear ellbow. Only small proportion of variance explained.
sc.pp.neighbors(canogamez_act, n_neighbors = 10, n_pcs = 40)
sc.tl.umap(canogamez_act)
computing neighbors
using 'X_pca' with n_pcs = 40
finished: added to `.uns['neighbors']`
`.obsp['distances']`, distances for each pair of neighbors
`.obsp['connectivities']`, weighted adjacency matrix (0:00:14)
computing UMAP
finished: added
'X_umap', UMAP coordinates (adata.obsm) (0:00:14)
sc.pl.umap(canogamez_act, color = ['cytokine.condition','cell.type', 'donor.id'], wspace = 0.5)
The Louvain algorthim was chosen for clustering as it was used by Cano-Gamez et al.
! Caution: Rerunning your code will change the cluster composition due to randomness of the algorthim !
sc.tl.louvain(canogamez_act, key_added = "louvain_1.0", random_state=1)
running Louvain clustering
using the "louvain" package of Traag (2017)
finished: found 17 clusters and added
'louvain_1.0', the cluster labels (adata.obs, categorical) (0:00:05)
sc.pl.umap(canogamez_act, color=['louvain_1.0', 'cytokine.condition','cell.type', 'donor.id'], wspace=0.5)
Identified 17 louvain clusters, same number as in the paper by Cano-Gamez et al.
def count_pie(anndata, clustering, category):
"""generates a data frame with counts for a specific category within
the clusters and, plots values as pie chart"""
# generate data frame with information for cluster
clusters_df = anndata.obs[str(clustering)].to_frame()
clusters_df[str(category)] = anndata.obs[str(category)]
# generate empty dataframe for counted values
number_clusters = len(np.unique(anndata.obs[str(clustering)]))
row_names = list(np.unique(anndata.obs[str(clustering)]))
row_names_long = ['cluster ' + name for name in row_names]
col_names = list(anndata.obs[str(category)].cat.categories)
df_cell_count = pd.DataFrame(0, columns=col_names,
index=row_names_long)
# fill dataframe with counts of the given categorie
for i in range(0, number_clusters):
cluster = clusters_df[str(clustering)] == str(i)
cells_cluster = clusters_df[cluster]
count_cells = cells_cluster.value_counts()
for ic, vc in count_cells.items():
df_cell_count.at['cluster ' + ic[0], ic[1]] = vc
# plot as piechart
from natsort import natsorted
df_cell_count_T = df_cell_count.T
df_cell_count_T
df_cell_count_T.reindex(natsorted(df_cell_count_T.columns, ), axis=1)
amount_plots = len(df_cell_count)
amount_cols = 4
amount_rows = int(np.ceil(amount_plots / amount_cols))
fig, axes = plt.subplots(nrows=amount_rows, ncols=amount_cols,
figsize=(15, 15))
fig.tight_layout()
for index, column in enumerate(df_cell_count_T):
current_ax = axes[index // amount_cols, index % amount_cols]
current_ax.set_title('{}'.format(column))
current_data = df_cell_count_T[column]
current_labels = list(current_data.index)
current_data = list(current_data)
current_ax.pie(current_data, labels=current_labels,
autopct='%1.1f%%', startangle=90)
current_ax.axis('equal')
return df_cell_count, plt.show()
count_pie(canogamez_act, 'louvain_1.0', 'cell.type')
( Memory Naive cluster 0 5186 134 cluster 1 403 4261 cluster 10 1288 240 cluster 11 476 613 cluster 12 987 30 cluster 13 475 531 cluster 14 347 221 cluster 15 140 207 cluster 16 20 3 cluster 2 2997 1400 cluster 3 189 4192 cluster 4 1476 2684 cluster 5 1976 56 cluster 6 20 1907 cluster 7 1630 218 cluster 8 151 1625 cluster 9 1281 459, None)
count_pie(canogamez_act, 'louvain_1.0', 'cytokine.condition')
( Th0 Th2 Th17 iTreg cluster 0 326 145 2248 2601 cluster 1 139 5 3029 1491 cluster 10 295 195 460 578 cluster 11 229 170 362 328 cluster 12 354 160 285 218 cluster 13 11 0 499 496 cluster 14 151 142 154 121 cluster 15 74 90 112 71 cluster 16 4 1 10 8 cluster 2 957 790 1448 1202 cluster 3 501 202 886 2792 cluster 4 833 3222 46 59 cluster 5 1196 677 69 90 cluster 6 77 13 240 1597 cluster 7 81 47 866 854 cluster 8 1440 64 122 150 cluster 9 641 1010 32 57, None)
All clusters are a mixture of cytokine cell types. Most of them with high proportion of Th17 and iTreg.
sc.tl.rank_genes_groups(canogamez_act, groupby = 'louvain_1.0', method = 'wilcoxon', use_raw = True)
ranking genes
finished: added to `.uns['rank_genes_groups']`
'names', sorted np.recarray to be indexed by group ids
'scores', sorted np.recarray to be indexed by group ids
'logfoldchanges', sorted np.recarray to be indexed by group ids
'pvals', sorted np.recarray to be indexed by group ids
'pvals_adj', sorted np.recarray to be indexed by group ids (0:01:06)
sc.pl.rank_genes_groups_matrixplot(canogamez_act, n_genes = 3, cmap = 'bwr', standard_scale = 'var',
values_to_plot = 'scores')
Use identified genes to annotate IFN high, HSP high and Mitotic clusters:
Used marker genes
The marker genes from literature are based on the following website: https://www.biocompare.com/Editorial-Articles/569888-A-Guide-to-T-Cell-Markers/
| T cell type / differentiation state | Marker genes mentioned in Cano-Gamez et al. | |
|---|---|---|
| Tn/Th2 | GATA3, MAOA, LIMA1, MRPS26 | |
| Tn/Th17 | TNFRSF8, PALLD, RORA | |
| Tn/iTreg | FOXP3, LMCD1, LGALS3, CCL5 | |
| Tn/Th17/iTreg | IL2, DUSP2, TNF | |
| Tm/Th17/iTreg | CCL5, LGALS3, TNFRSF8, BACH2, BATF3, AHR, IL17F, CTLA4 | |
| central memory T cells (Tcm) | PASK | |
| effector memory T cells (Tem) | IL7R, KLRB1, TNFSF13B | |
| terminally differentiated effector cells (TEMRA) | PRF1, CCL4, GZMA, GZMH | |
| natural T regulatory cells (nTreg) | FOXP3, CTLA4 |
| T cell type / differentiation state | Marker genes literature | |
|---|---|---|
| T naive (Tn) | CCR7 | |
| central memory T cells (Tcm) | FAS, IL2RB, PRDM1 | |
| effector memory T cells (Tem) | CXCR3, ITGAL, CCR5, TBX21 |
Map marker genes on clusters
Tn/Th2
sc.pl.umap(canogamez_act, color = ['louvain_1.0', 'GATA3', 'MAOA', 'LIMA1', 'MRPS26' ], legend_loc = 'on data',
wspace = 0.5)
-> cluster 4
Tn/Th17
sc.pl.umap(canogamez_act, color = ['louvain_1.0', 'TNFRSF8', 'PALLD', 'RORA'], legend_loc = 'on data',
wspace = 0.5)
-> cluster 15, cluster 1
Tn/iTreg
sc.pl.umap(canogamez_act, color = ['louvain_1.0','IL2', 'DUSP2', 'TNF'], legend_loc = 'on data',
wspace = 0.5)
-> cluster 6
Tn/Th17/iTreg
sc.pl.umap(canogamez_act, color = ['louvain_1.0','IL2', 'DUSP2', 'TNF'], legend_loc = 'on data', wspace = 0.5)
-> cluster 15, cluster 1
Tm/Th17/iTreg
sc.pl.umap(canogamez_act, color = ['louvain_1.0','CCL5', 'LGALS3', 'TNFRSF8', 'BACH2', 'BATF3', 'AHR',
'IL17F', 'CTLA4'], legend_loc = 'on data', wspace = 0.5)
Tcm
sc.pl.umap(canogamez_act, color = ['louvain_1.0','FAS', 'IL2RB', 'PRDM1', 'PASK'], legend_loc = 'on data',
wspace = 0.5)
Tem
sc.pl.umap(canogamez_act, color = ['louvain_1.0', 'CXCR3', 'ITGAL', 'CCR5', 'TBX21', 'IL7R', 'KLRB1', 'TNFSF13B'],
legend_loc = 'on data', wspace = 0.5)
TEMRA
sc.pl.umap(canogamez_act, color = ['louvain_1.0','PRF1', 'CCL4', 'GZMA', 'GZMH'], legend_loc = 'on data',
wspace = 0.5)
nTreg
sc.pl.umap(canogamez_act, color=['louvain_1.0','FOXP3', 'CTLA4'], legend_loc='on data',
wspace=0.5)
Add annotation to AnnData
Create annotation
The annotation is based on marker genes and composition of the clusters. Since an attempt was made to identify the same clusters as by Cano-Gamez et al., some annotations may have little correspondence with the observations made above.
# adjust to individually identified clusters
cluster_annotation = {
'8':'Th0/Tn',
'3':'Th0/Tcm1',
'16':'Th0/Tcm2',
'9':'Th0/Tem',
'5':'Th0/Temra',
'15':'Th17/iTreg/Tn',
'13':'Th17/iTreg/Tcm1',
'7':'Th17/iTreg/Tcm2',
'0':'Th17/iTreg/Tem',
'10':'Th17/iTreg/Temra',
'6':'iTreg/Tn',
'1':'Th17/iTreg/Tn',
'4':'Th2/Tn',
'12':'nTreg',
'11':'IFN high',
'2':'HSP high',
'14':'Mitotic'
}
Add annotation to data
canogamez_act.obs['cell type'] = canogamez_act.obs['louvain_1.0'].map(cluster_annotation).astype('category')
Plotting
sc.pl.umap(canogamez_act, color='cell type', legend_loc='on data',
frameon=False, legend_fontsize=5)
Annotation does not match the one in the paper (see: https://cytokines.cellgeni.sanger.ac.uk/simulated). Take the output from different factor analyis algorithms (Factor_Analysis_Methods.ipynb) for alternative clustering.
canogamez_act.write(results_file)